phase 5 News

A Statement on the Advantages and Disadvantages of the Unified Memory Architecture

or in other words, "Why develop something new, if you can just buy it instead?"

Since we released the basic system specifications of our A\BOX system and the still-under-development CAIPIRINHA chip, the public has been discussing the sense or nonsense of this development. Most of the discussion has centered on why a Unified Memory Architecture (UMA) should be used at all, and whether a standard design built from available components wouldn't make more sense.

One objection often raised against the CAIPIRINHA concept and its UMA design concerns the memory bandwidth consumed by certain system functions, processor access and the video outputs. There have been many heated discussions in which critics of the UMA design offered extremely simple examples to point out possible disadvantages, such as: "1600x1200 pixels in 24 bit at 75 Hz = 432 MB/s permanent usage; in addition, a second video output, 3D calculation with tons of textures, multichannel audio and more is all it takes to slow CPU access down." With this reasoning some like to argue for a separate bus and/or graphics on PCI or AGP. Other arguments try to make people believe that cheaper modular solutions or bus extensions are more future-proof. In the following we comment on these points, though with a slight grin on our face.
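As a rough check, the critics' figure can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: the function name `scanout_bandwidth` is made up for this purpose, and it assumes 3 bytes per pixel for 24-bit color and 1 MB = 1,000,000 bytes, which is what the quoted 432 MB/s implies.

```python
def scanout_bandwidth(width, height, bits_per_pixel, refresh_hz):
    """Bytes per second needed to read one full frame from memory per refresh.

    Assumes a conventional frame buffer in which every pixel is stored
    at full color depth (the case the critics calculate with).
    """
    bytes_per_pixel = bits_per_pixel // 8
    return width * height * bytes_per_pixel * refresh_hz


# The example quoted above: 1600x1200 pixels, 24 bit, 75 Hz.
critics_rate = scanout_bandwidth(1600, 1200, 24, 75)
print(critics_rate / 1_000_000, "MB/s")  # 432.0 MB/s
```

The calculation treats the frame as one solid block of full-depth pixels, which is exactly the assumption questioned in point 2.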

1.) The everyday architecture with separate main memory and graphics memory, with the graphics on PCI for example, has much lower bandwidths (which of course do not add up). Dividing the memory creates the need to transfer data from main memory into the video board's RAM. Here are three examples.

- A PC processor calculates an animated 3D scene. To do so it reads tens of thousands of coordinate values for every frame, performs heavy calculations and writes the data back into main memory. After that the data must be read out of main memory again, sorted properly and then sent over the PCI bus to the 3D graphics board. Since the scene is rather complex and the graphics board unfortunately has only 2 or 4 MB of RAM, loads of new textures have to be transferred into the texture memory, where the 3D chip needs them to render the polygons. The alternative is to stick to very simple scenes with textures small enough to fit permanently into 1 MB of reserved memory on the graphics board ... a real high-end solution.

- A video digitizer writes its real-time picture data into the main memory of the PC, since that is where it is supposed to be edited. To show it as an animated window, the data must be copied again into the graphics board's memory: 25 times a second about 1 MB of data equals 25 MB/s, or about half of the actually usable bandwidth of many PCI systems. What a pity that the other half is already being used by the video digitizer ...

- A 4000x4000 pixel x 24 bit (= 48 MB) image is displayed on the graphics board at a resolution of 1280x1024, and you would like to scroll around in it (panning). Of course that is possible on a standard PC architecture, disregarding the fact that the PCI bus is totally "jammed" because the processor is busy transferring the screen data from main memory. Anyway, the fact that the bus is jammed doesn't really matter, because the CPU couldn't use any free bandwidth for real calculations while it is busy transferring data.
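The figures in the second and third examples can be followed with a small calculation. This is a sketch under stated assumptions: 1 MB is taken as 1,000,000 bytes, the usable PCI bandwidth of 50 MB/s is inferred from the "about half" remark rather than measured, and all names are illustrative.

```python
MB = 1_000_000  # assumption: decimal megabytes, as in the figures above


def copy_traffic(frame_bytes, frames_per_second):
    """Bytes per second moved over the bus when every frame is copied once."""
    return frame_bytes * frames_per_second


# Video digitizer example: about 1 MB per frame, 25 frames per second.
digitizer_to_main = copy_traffic(1 * MB, 25)   # digitizer -> main memory
main_to_gfx_board = copy_traffic(1 * MB, 25)   # main memory -> graphics board

# Assumed usable PCI bandwidth, inferred from "about half" in the text.
pci_usable = 50 * MB
bus_share = (digitizer_to_main + main_to_gfx_board) / pci_usable

# Panning example: size of the 4000x4000 image at 3 bytes per pixel.
image_bytes = 4000 * 4000 * 3

print(digitizer_to_main // MB, "MB/s per copy")     # 25 MB/s per copy
print(f"{bus_share:.0%} of the assumed usable bus")  # 100% of the assumed usable bus
print(image_bytes // MB, "MB image")                # 48 MB image
```

In a unified memory the second copy (main memory to graphics board) does not exist, so only the digitizer's own write remains.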

In all of these examples (and the list could go on and on) the UMA design has obvious advantages, because transferring huge masses of data often becomes unnecessary: the data is already where every functional unit can access it, in a unified memory. With the UMA/DLRP combination (see below), display data located at any address in memory can be shown at any screen position without spending bandwidth and CPU time on copying it into a "video memory". The same goes for other data, for example 3D coordinates, textures, sound data and much more. In the end we can only say this: a well-implemented UMA design not only offers clearly more memory bandwidth than today's (and future) standard solutions, it also strongly reduces the need for memory bandwidth in the first place, and so leaves more power and resources for high-end applications.

2.) The simple bandwidth calculation assumes the conventional design of graphics boards, which need the picture data to sit in one piece in a single block of memory. The amount of data and the color depth are therefore always at the maximum, which is a totally senseless concept. The advanced technology of the Display List RISC Processor (DLRP) in the CAIPIRINHA chip offers a completely different concept, in which the displayed screen does not have to exist in memory in this form. Because the color depth is flexible, the data flow is much lower. A single DLRP command may, for example, instruct 100 pixels in a row to be displayed in a certain color. In a system where, as is possible with CAIPIRINHA, you may have 24-bit windows of whatever shape and size, the user may choose whether to spend memory and bandwidth on a 24-bit background image; if he chooses a color-reduced, single-color or grid background instead, he obviously saves resources. The display of one 1600-pixel line could, roughly translated into human language, look like the following DLRP sequence.

{

; This is Background

Show 10 Pixel with 1 Byte from Cache address $xxxxxxxx

; Here a line of the scrollbar is displayed from the cache

Show 700 Pixel RGBA with 4 Byte from address $yyyyyyyyy

; 700 pixels of a 24-bit picture

Show 350 Pixel Palette with 1 Byte from address $zzzzzzz

; Here is a window in front of the picture, e.g. a control panel displayed with 256 colors

Show 312 Pixel RGBA 128,128,256,0

; Here's the background again, up to the right edge

}

In this example one line takes about 3150 bytes plus a few instructions from main memory, whereas a "traditional" 1600-pixel 24-bit line would take 6,400 bytes. Applied to the whole screen display, this reduces the peak bandwidth needed from about 432 MB/s to approximately 214 MB/s. As in this example and similar cases (which make up the main part of everyday applications), intelligent programming and configuration of the UMA/DLRP combination achieves a better use of system resources. We believe this is preferable to the approach of other common, resource-wasting GUI-based systems, which demand new processor generations on a regular basis.
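The per-line figures above can be traced with a small model of the display list. This is only a sketch: the `Span` type and its fields are invented for illustration and are not the DLRP instruction format, the spans are taken exactly as they appear in the listing, and the size of the "few instructions" is not stated in the text, so it is left out.

```python
from dataclasses import dataclass


@dataclass
class Span:
    pixels: int            # number of pixels the command displays
    bytes_per_pixel: int   # bytes fetched per pixel (0 for a constant color)
    from_main_memory: bool # False if served from cache or from the command itself


# The spans as they appear in the listing above.
line = [
    Span(10, 1, False),   # scrollbar line, read from the cache
    Span(700, 4, True),   # RGBA picture data, 4 bytes per pixel
    Span(350, 1, True),   # palette window, 1 byte per pixel
    Span(312, 0, False),  # constant-color background, no pixel fetch
]


def main_memory_bytes(spans):
    """Pixel data fetched from main memory for one display line."""
    return sum(s.pixels * s.bytes_per_pixel for s in spans if s.from_main_memory)


dlrp_bytes = main_memory_bytes(line)   # 2800 + 350
traditional_bytes = 1600 * 4           # the 6,400 bytes quoted in the text
scaled_peak = 432 * dlrp_bytes / traditional_bytes

print(dlrp_bytes)         # 3150
print(traditional_bytes)  # 6400
print(scaled_peak)        # 212.625
```

The scaled value comes out slightly below the roughly 214 MB/s given in the text; the difference is consistent with the instruction fetches that this sketch does not count.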

3.) Many critics of the A\BOX concept or of the UMA design of our CAIPIRINHA prefer to argue from high-end demands on system performance and graphics display. Not knowing the CAIPIRINHA design concept, they assume possible bottlenecks and limitations and then compare it with the performance of current affordable graphics boards built from standard components. For this they like to take the example of a complex 3D display at the highest resolution and refresh rate. Leaving aside the praise for the everyday standard concepts, various facts remain.

- Today's PCI graphics boards already cannot keep up with the needs of multimedia and 3D applications; the highly praised PCI bus is already at the limit of its bandwidth. This doesn't matter, since the industry already has the solution at hand: AGP, with almost 400 MB/s for really fast 3D applications. This solution creates the demand for new graphics boards. In addition, new motherboards with an AGP port can be sold to users ... It is going to be really interesting in one or two years, when the limits of AGP are reached, as defined by marketing strategies that promote a new and more efficient software generation which will, quite unpredictably, demand a new hardware generation.

- Current PCI 3D graphics boards at affordable prices are rarely able to display resolutions higher than 1280x1024 in 24 bit, even the new EDO RAM based cards. For better results you have to buy a high-end graphics board with VRAM or WRAM. These are the only cards that can be compared at least *slightly* with the A\BOX system, and then only in terms of resolution, not in other features.

- Many graphics boards with chips from leading manufacturers already offer fast 3D graphics, but at low resolutions and reduced color depth. In other words: many 3D engines do not use the highest resolution the chips support, but mostly only 800x600 in 16 bit (some 3D engines can't manage 3D in 24 bit at all). Such resolutions could easily be displayed at a 150 Hz refresh rate on the CAIPIRINHA system while using less than 15% of the bandwidth. Actually this has nothing to do with REAL 3D graphics (the same goes for those neat-looking and fast consoles); most current systems are not even equipped for that yet.

- For a more realistic point of view you have to keep the limitations of current system designs in mind. The often-mentioned theoretical peak performance of standard systems is further from reality than the CAIPIRINHA design is from its theoretical maximum performance.
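The "less than 15%" figure for 800x600 in 16 bit at 150 Hz in the list above can be checked as follows. The sketch assumes 2 bytes per pixel, a conventional full-frame scan-out, and the estimated 1,600 MB/s UMA memory bandwidth quoted in point 4 of this statement; the variable names are illustrative.

```python
MB = 1_000_000  # assumption: decimal megabytes

# 800x600 pixels, 16 bit (2 bytes) per pixel, 150 Hz refresh.
display_rate = 800 * 600 * 2 * 150

# Estimated UMA memory bandwidth as quoted in this statement.
uma_bandwidth = 1600 * MB

share = display_rate / uma_bandwidth

print(display_rate // MB, "MB/s")          # 144 MB/s
print(f"{share:.0%} of the UMA bandwidth")  # 9% of the UMA bandwidth
```

Even against the full frame-buffer calculation, without any savings from the DLRP, the display stays well under the 15% mark.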

Again we must point out that even the industry considers the PCI bus outdated, and that in future developments, for example in PowerPC or x86 based systems, it will be replaced by AGP, which brings a speed increase by a factor of 3. That will still not overcome the limits, nor will it even come close to the performance of the fast UMA design or of CAIPIRINHA.

4.) As an argument against UMA some have said that the CPU's memory access might be limited. They estimated a required bandwidth of 400 MB/s from a bus speed of 50 MHz x 8 bytes (64-bit bus). On a CAIPIRINHA system with a theoretical CPU bus speed of 100 MHz (as soon as PowerPC processors are that fast on the bus, which they are not yet capable of) the required bandwidth may even be estimated at 800 MB/s. The currently estimated bandwidth of 1,600 MB/s of the UMA memory, by contrast, was dismissed as a theoretical maximum that could never be reached in practice, which is of course true. But this applies even more to the theoretical 400 or 800 MB/s, since even the fastest PowerPC processors on the market cannot handle such masses of data in reasonable applications (and since the simple but performance-eating job of data transfer is done by CAIPIRINHA, the CPU can be used for more valuable work).

Beyond that, compared with the UMA design of CAIPIRINHA, current PC system controllers achieve, according to test results in various independent magazines, an actual main memory throughput of less than 100 MB/s, and that goes for the fastest Pentium and Pentium Pro systems. Even the standard MPC106 controller by Motorola (a combined memory / cache / PCI bus controller for PowerPC machines) with 60 ns RAM and a 64-bit data bus does not exceed a maximum of 133 MB/s (about the performance of a zero-wait-state RAM controller at 16 MHz) and will be much slower in reality. Even if the PowerPC processor received data from CAIPIRINHA at only 200-300 MB/s due to extremely heavy system activity, this would still outrun any current standard design on the market by all means (even those which will be available in 1997).
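The theoretical figures in this point all follow from bus clock times bus width. The sketch below illustrates that arithmetic; the function names are illustrative, 1 MB is taken as 1,000,000 bytes, and the 60 ns case assumes one 64-bit transfer per 60 ns cycle, which is a simplification and not a description of the MPC106's actual timing.

```python
def peak_bandwidth(bus_hz, bus_bytes):
    """Theoretical peak in bytes per second: one full-width transfer per clock."""
    return bus_hz * bus_bytes


def bandwidth_from_cycle_time(cycle_ns, bus_bytes):
    """Theoretical peak in bytes per second for one transfer per memory cycle."""
    return bus_bytes * 1_000_000_000 // cycle_ns


cpu_bus_50 = peak_bandwidth(50_000_000, 8)    # 50 MHz x 64 bit
cpu_bus_100 = peak_bandwidth(100_000_000, 8)  # 100 MHz x 64 bit
ram_60ns = bandwidth_from_cycle_time(60, 8)   # 60 ns RAM on a 64-bit bus

print(cpu_bus_50 // 1_000_000, "MB/s")   # 400 MB/s
print(cpu_bus_100 // 1_000_000, "MB/s")  # 800 MB/s
print(ram_60ns // 1_000_000, "MB/s")     # 133 MB/s
```

A 60 ns cycle corresponds to roughly 16.7 MHz, which is where the comparison with a zero-wait-state controller at about 16 MHz comes from.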

5.) Another argument against the high integration required by the CAIPIRINHA concept is expandability. People like to criticize that the controller (including graphics and audio) is integrated on the motherboard and cannot be exchanged over a standard bus system (which is currently impossible anyway, since no standardized bus system delivers the required performance). Besides that, the CAIPIRINHA design (due to be finished in 1997) will use the available technologies to their limits, such as 100 MHz SRAMs, which have been available for 2 years but are only now being put to use, and a 100 MHz CPU bus that no present processor can push to its maximum. Due to its unique and innovative design CAIPIRINHA will offer years of leading performance.

This cannot be expected of many current modular systems. Whoever buys a PCI graphics card today, for example, invests in a quickly outdated technology, whereas the next generation of boards with faster AGP graphics requires a new generation of motherboards for PowerPC and x86 systems. This means the user must replace the graphics card and the motherboard, including all controllers. Yet these next-generation AGP motherboards only widen the bottleneck from 132 MB/s to approximately 400 MB/s and still face an expansion limit, which leaves these systems with limited future-proofing.

Other concepts, in which the processor comes as a module together with memory and cache (similar, by the way, to accelerator concepts such as the CYBERSTORM, which use onboard memory because of the awfully slow memory design of the A4000), cost the user much more money to upgrade, since an upgrade usually includes the purchase of a new processor, cache and system controller plus new sockets for memory and cache modules. The sense of these extra costs is more than questionable: if you want to move to a more advanced cache and memory design such as SDRAM, you will very likely have to replace the cache and memory modules as well. As you see, it remains unproven how far these modular concepts will be sufficient and up to date in the future.